9 research outputs found

    Deep Exploration for Recommendation Systems

    Modern recommendation systems stand to benefit from probing for and learning from delayed feedback. Research has tended to focus on learning from a user's response to a single recommendation. Such work, which leverages methods of supervised and bandit learning, forgoes learning from the user's subsequent behavior. Where past work has aimed to learn from subsequent behavior, effective methods for probing to elicit informative delayed feedback have been lacking. Effective exploration through probing for delayed feedback becomes particularly challenging when rewards are sparse. To address this, we develop deep exploration methods for recommendation systems. In particular, we formulate recommendation as a sequential decision problem and demonstrate the benefits of deep exploration over single-step exploration. Our experiments, carried out with high-fidelity industrial-grade simulators, establish large improvements over existing algorithms.
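
    The deep-exploration idea can be sketched in toy form: keep an ensemble of randomized value functions, sample one member per episode, and follow it greedily, so exploration is temporally extended rather than per-step dithering. Everything below (the chain environment, tabular Q-functions, optimistic initialization, and all hyperparameters) is an illustrative assumption, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10                                   # chain length; reward only at the last state
ENSEMBLE, EPISODES, GAMMA, ALPHA = 5, 300, 0.9, 0.5

# One tabular Q-function per ensemble member. Optimistic, randomized
# initialization makes members disagree, so a sampled member can commit
# to a multi-step plan instead of dithering one step at a time.
Q = [rng.normal(2.0, 0.3, size=(N, 2)) for _ in range(ENSEMBLE)]

def step(s, a):
    """Action 1 moves right, action 0 stays; reaching state N-1 pays 1."""
    s2 = s + 1 if a == 1 else s
    r = 1.0 if s2 == N - 1 else 0.0
    return s2, r, s2 == N - 1

for _ in range(EPISODES):
    k = rng.integers(ENSEMBLE)           # sample one member, follow it greedily
    s, done, t = 0, False, 0
    while not done and t < 4 * N:
        a = int(np.argmax(Q[k][s]))
        s2, r, done = step(s, a)
        for q in Q:                      # every member learns from the shared data
            target = r if done else r + GAMMA * np.max(q[s2])
            q[s, a] += ALPHA * (target - q[s, a])
        s, t = s2, t + 1

# The greedy policy of the ensemble mean should now walk straight to the reward.
policy = np.argmax(np.mean(Q, axis=0), axis=1)
s, reached = 0, False
for _ in range(N):
    s, r, reached = step(s, int(policy[s]))
    if reached:
        break
```

    With single-step (e.g. epsilon-greedy) exploration, reaching the lone rewarding state of a long chain takes time exponential in its length; committing to one sampled value function for a whole episode is what makes the sparse reward findable.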

    Scalable Neural Contextual Bandit for Recommender Systems

    High-quality recommender systems ought to deliver both innovative and relevant content through effective and exploratory interactions with users. Yet supervised learning-based neural networks, which form the backbone of many existing recommender systems, only leverage recognized user interests, falling short when it comes to efficiently uncovering unknown user preferences. While there has been some progress with neural contextual bandit algorithms towards enabling online exploration through neural networks, their onerous computational demands hinder widespread adoption in real-world recommender systems. In this work, we propose a scalable, sample-efficient neural contextual bandit algorithm for recommender systems. To do this, we design an epistemic neural network architecture, Epistemic Neural Recommendation (ENR), that enables Thompson sampling at a large scale. In two distinct large-scale experiments with real-world tasks, ENR significantly boosts click-through rates and user ratings by at least 9% and 6%, respectively, compared to state-of-the-art neural contextual bandit algorithms. Furthermore, it achieves equivalent performance with at least 29% fewer user interactions than the best-performing baseline algorithm. Remarkably, while accomplishing these improvements, ENR demands orders of magnitude fewer computational resources than neural contextual bandit baseline algorithms.
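
    The Thompson-sampling mechanism that ENR scales up can be illustrated with a much smaller stand-in for an epistemic network: per-arm Bayesian linear regression, where acting greedily on a posterior sample gives exploration for free. The reward model, dimensions, and horizon below are invented for the sketch and are not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
K, d, T, noise = 3, 2, 2000, 0.1
W_true = rng.normal(size=(K, d))             # hidden per-arm reward weights

A = [np.eye(d) for _ in range(K)]            # per-arm posterior precision
b = [np.zeros(d) for _ in range(K)]
optimal = []

for _ in range(T):
    x = rng.normal(size=d)                   # user/context features
    scores = []
    for k in range(K):
        cov = np.linalg.inv(A[k])
        w_tilde = rng.multivariate_normal(cov @ b[k], cov)  # posterior draw
        scores.append(w_tilde @ x)           # act greedily on the sample
    a = int(np.argmax(scores))
    r = W_true[a] @ x + noise * rng.normal()
    A[a] += np.outer(x, x)                   # conjugate Gaussian update
    b[a] += r * x
    optimal.append(a == int(np.argmax(W_true @ x)))

late_accuracy = float(np.mean(optimal[-200:]))  # approaches 1 as posteriors sharpen
```

    An epistemic network plays the same role as the Gaussian posterior here: it supplies cheap, approximately posterior-distributed reward samples when the exact conjugate update is unavailable.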

    Optimism Based Exploration in Large-Scale Recommender Systems

    Bandit learning algorithms have become an increasingly popular design choice for recommender systems. Despite the strong interest in bandit learning from the community, multiple bottlenecks prevent many bandit learning approaches from reaching production. Two of the most important are scaling to multi-task settings and A/B testing. Classic bandit algorithms, especially those leveraging contextual information, often require reward signals for uncertainty estimation, which hinders their adoption in multi-task recommender systems. Moreover, unlike supervised learning algorithms, bandit learning algorithms place great emphasis on the data collection process through their explorative nature. Such explorative behavior induces unfair evaluation of bandit learning agents in a classic A/B test setting. In this work, we present a novel design of a production bandit learning life-cycle for recommender systems, along with a novel set of metrics to measure efficiency in user exploration. Through large-scale production recommender system experiments and in-depth analysis, we show that our bandit agent design improves personalization for the production recommender system and that our experiment design fairly evaluates the performance of bandit learning algorithms.
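
    Optimism-based exploration of the kind referenced here typically scores each candidate by a reward estimate plus an uncertainty bonus. A minimal LinUCB-style sketch, with invented features, simulated rewards, and an arbitrary bonus weight rather than the production agent, looks like:

```python
import numpy as np

rng = np.random.default_rng(2)
K, d, T, alpha = 4, 3, 3000, 1.0             # items, feature dim, rounds, bonus weight
W_true = rng.normal(size=(K, d))             # hidden per-item reward weights

A = [np.eye(d) for _ in range(K)]            # per-item ridge design matrices
b = [np.zeros(d) for _ in range(K)]
correct = 0

for t in range(T):
    x = rng.normal(size=d)
    ucb = []
    for k in range(K):
        A_inv = np.linalg.inv(A[k])
        theta = A_inv @ b[k]                 # ridge estimate of item weights
        bonus = alpha * np.sqrt(x @ A_inv @ x)   # optimism: wide posterior => big bonus
        ucb.append(theta @ x + bonus)
    a = int(np.argmax(ucb))                  # explore via inflated scores, deterministically
    r = W_true[a] @ x + 0.1 * rng.normal()
    A[a] += np.outer(x, x)
    b[a] += r * x
    if t >= T - 300:
        correct += int(a == int(np.argmax(W_true @ x)))

late_accuracy = correct / 300
```

    The bonus shrinks as an item accumulates observations, which is also why naive A/B tests penalize such agents: early rounds deliberately pay an exploration cost that a purely greedy baseline does not.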

    Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning

    Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on user behavior. In this study, we employ reinforcement learning to optimize for long-term return metrics in an auction-based recommender system. Utilizing temporal difference learning, a fundamental reinforcement learning algorithm, we implement a one-step policy improvement approach that biases the system towards recommendations with higher long-term user engagement metrics. This optimizes value over long horizons while maintaining compatibility with the auction framework. Our approach is grounded in dynamic programming, which shows that our method provably improves upon the existing auction-based base policy. Through an online A/B test conducted on an auction-based recommender system which handles billions of impressions and users daily, we empirically establish that our proposed method outperforms the current production system in terms of long-term user engagement metrics.
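
    The one-step improvement recipe (evaluate the base policy with temporal-difference learning, then re-rank candidates by immediate reward plus discounted next-state value) can be sketched on a two-state user model. The item names, rewards, and transitions below are assumptions for illustration only, not the paper's system.

```python
import numpy as np

rng = np.random.default_rng(3)
GAMMA, ALPHA, STEPS = 0.9, 0.01, 40_000

# Hypothetical items: (immediate base reward, next user-engagement state).
# "clickbait" pays now but leaves the user casual; "quality" builds engagement.
ITEMS = {"clickbait": (1.0, 0), "quality": (0.2, 1)}

def reward(base, state):
    return base * (1 + 2 * state)   # engaged users (state 1) respond 3x more

V = np.zeros(2)
s = 0
names = list(ITEMS)
for _ in range(STEPS):              # TD(0) evaluation of the random base policy
    base, s2 = ITEMS[names[rng.integers(len(names))]]
    r = reward(base, s)
    V[s] += ALPHA * (r + GAMMA * V[s2] - V[s])
    s = s2

def improved_choice(state):
    # One-step improvement: greedy w.r.t. r + gamma * V(s') under the learned V.
    return max(ITEMS, key=lambda k: reward(ITEMS[k][0], state) + GAMMA * V[ITEMS[k][1]])
```

    The improved policy recommends the low-immediate-reward "quality" item to casual users because the learned value of the engaged state outweighs the forgone click, which is exactly the bias toward long-term engagement the abstract describes.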

    Dietary inflammatory potential mediated gut microbiota and metabolite alterations in Crohn's disease: A fire-new perspective

    Background & aims: A pro-inflammatory diet interacting with the gut microbiome might act as a trigger for Crohn's disease (CD). We aimed to investigate the relationship between dietary inflammatory potential and changes in microflora and metabolites, and their link with CD. Methods: Dietary inflammatory potential was assessed using the dietary inflammatory index (DII), based on the Food Frequency Questionnaire, in 150 new-onset CD patients and 285 healthy controls (HCs). We selected 41 CD patients and 89 HCs who had not received medication and performed metagenomic and targeted metabolomic sequencing to profile their gut microbial composition as well as fecal and serum metabolites. DII scores were classified into quartiles to investigate associations among the different variables. Results: DII scores of CD patients were significantly higher than those of HCs (0.56 ± 1.20 vs 0.23 ± 1.02, p = 0.017). With adjustment for confounders, a higher DII score was significantly associated with a higher risk of CD (OR: 1.420; 95% CI: 1.049, 1.923; p = 0.023). The DII score was also positively correlated with disease activity (p = 0.001). Morganella morganii and Veillonella parvula were increased while Coprococcus eutactus was decreased in the pro-inflammatory diet group, as well as in CD. DII-related bacteria were associated with disease activity and inflammatory markers in CD patients. Among the metabolic changes, those induced by a pro-inflammatory diet largely involved amino acid metabolic pathways, which were also altered in CD. Conclusions: A pro-inflammatory diet might be associated with increased risk and disease activity of CD. A diet with a high DII is potentially involved in CD by mediating alterations in gut microbiota and metabolites.
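
    As a quick consistency check on the reported estimate (OR 1.420, 95% CI 1.049-1.923 per unit of DII), the implied log-odds coefficient and standard error can be recovered from the interval and round-tripped; this assumes the usual Wald-type interval from logistic regression, which the abstract does not state explicitly.

```python
import math

OR, lo, hi = 1.420, 1.049, 1.923             # reported odds ratio and 95% CI

beta = math.log(OR)                          # implied log-odds per unit DII
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)   # implied standard error

# Round trip: the Wald interval rebuilt from beta and se matches the report.
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
```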

    Two-tiered Online Optimization of Region-wide Datacenter Resource Allocation via Deep Reinforcement Learning

    This paper addresses the need for advanced techniques for continuously allocating workloads on shared infrastructure in data centers, a problem arising from the growing popularity and scale of cloud computing. It particularly emphasizes the scarcity of research on ensuring guaranteed capacity in capacity reservations during large-scale failures. To tackle these issues, the paper presents scalable solutions for resource management. It builds on the prior establishment of capacity reservation in cluster management systems and the two-level resource allocation problem addressed by the Resource Allowance System (RAS). Recognizing the limitations of Mixed Integer Linear Programming (MILP) for server assignment in a dynamic environment, this paper proposes the use of Deep Reinforcement Learning (DRL), which has been successful in achieving long-term optimal results for time-varying systems. Because directly applying DRL algorithms to large-scale instances with millions of decision variables is impractical, a novel two-level design utilizing a DRL-based algorithm is introduced to solve the optimal server-to-reservation assignment while accounting for fault tolerance, server movement minimization, and network affinity requirements. The paper explores the interconnection of the two levels and the benefits of this approach for achieving long-term optimal results in large-scale cloud systems. We further show in the experiment section that our two-level DRL approach outperforms the MIP solver and heuristic approaches and exhibits significantly reduced computation time compared to the MIP solver. Specifically, our two-level DRL approach performs 15% better than the MIP solver on minimizing the overall cost, and it takes only 26 seconds to execute 30 rounds of decision making, while the MIP solver needs nearly an hour.
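
    The two-level decomposition can be illustrated structurally: an upper level splits a reservation's demand across fault domains, and a lower level picks concrete servers while reusing the previous assignment to minimize movement. The paper drives both levels with DRL; a plain greedy heuristic stands in for it here, and the domain and server data are invented.

```python
def upper_level(demand, domains):
    """Spread one reservation's server demand evenly across fault domains."""
    quota = {dom: demand // len(domains) for dom in domains}
    for dom in domains[: demand % len(domains)]:
        quota[dom] += 1                     # hand out the remainder one by one
    return quota

def lower_level(quota, free_by_domain, previous):
    """Pick concrete servers per domain, reusing previous picks to cut movement."""
    chosen = []
    for dom, need in quota.items():
        # Servers already in the reservation sort first (False < True), so they
        # are kept whenever possible and movement is minimized greedily.
        pool = sorted(free_by_domain[dom], key=lambda s: s not in previous)
        chosen += pool[:need]
    return chosen

free = {"dom1": ["s1", "s2"], "dom2": ["s3", "s4"], "dom3": ["s5"]}
quota = upper_level(4, ["dom1", "dom2", "dom3"])
assignment = lower_level(quota, free, previous={"s2", "s3"})
```

    Splitting the problem this way is what keeps each decision small: the upper level chooses per-domain counts (a few variables), and the lower level only ranks servers within one domain at a time, instead of one monolithic assignment over millions of variables.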